Olympic Data

1 Import the data

  NOC Year Decade     ID First.Name                   Name Last.Name Sex Age
1 AFG 1960  1960s  59346   Mohammad   Mohammad Asif Khokan    Khokan   M  24
2 AFG 1960  1960s  59043       Faiz Faiz Mohammad Khakshar  Khakshar   M  18
3 AFG 1960  1960s 109486      Abdul     Abdul Hadi Shekaib   Shekaib   M  20
  Height Weight      BMI BMI.Category        Team Population       GDP    GDPpC
1    171     78 26.67487            3 Afghanistan    8996973 537777800 59.77319
2    162     52 19.81405            0 Afghanistan    8996973 537777800 59.77319
3    178     68 21.46194            2 Afghanistan    8996973 537777800 59.77319
        Games Season City     Sport                                   Event
1 1960 Summer Summer Roma Wrestling Wrestling Men's Middleweight, Freestyle
2 1960 Summer Summer Roma Wrestling    Wrestling Men's Flyweight, Freestyle
3 1960 Summer Summer Roma Athletics              Athletics Men's 100 metres
     Medal Medal.No.Yes
1 No Medal            0
2 No Medal            0
3 No Medal            0
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
'data.frame':   151977 obs. of  24 variables:
 $ NOC         : Factor w/ 122 levels "AFG","ALB","AND",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year        : int  1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ...
 $ Decade      : Factor w/ 6 levels "1960s","1970s",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID          : int  59346 59043 109486 59102 128736 29626 39922 106372 128736 58364 ...
 $ First.Name  : Factor w/ 14118 levels "","A","A.","Aadam",..: 8716 3731 64 599 64 11978 64 4634 64 8716 ...
 $ Name        : Factor w/ 74268 levels "  Gabrielle Marie \"Gabby\" Adcock (White-)",..: 48941 19066 218 3341 220 64832 215 23793 220 48946 ...
 $ Last.Name   : Factor w/ 47370 levels "","-)","-Alard)",..: 23228 23112 38893 23137 44908 13260 16633 37860 44908 22890 ...
 $ Sex         : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age         : int  24 18 20 35 20 28 22 23 20 20 ...
 $ Height      : int  171 162 178 166 179 168 172 170 179 166 ...
 $ Weight      : num  78 52 68 66 75 73 70 58 75 62 ...
 $ BMI         : num  26.7 19.8 21.5 24 23.4 ...
 $ BMI.Category: Factor w/ 5 levels "0","1","2","3",..: 4 1 3 3 3 4 3 3 3 3 ...
 $ Team        : Factor w/ 332 levels "Acipactli","Afghanistan",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Population  : int  8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 8996973 ...
 $ GDP         : num  5.38e+08 5.38e+08 5.38e+08 5.38e+08 5.38e+08 ...
 $ GDPpC       : num  59.8 59.8 59.8 59.8 59.8 ...
 $ Games       : Factor w/ 30 levels "1960 Summer",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Season      : Factor w/ 2 levels "Summer","Winter": 1 1 1 1 1 1 1 1 1 1 ...
 $ City        : Factor w/ 29 levels "Albertville",..: 19 19 19 19 19 19 19 19 19 19 ...
 $ Sport       : Factor w/ 51 levels "Alpine Skiing",..: 51 51 3 51 3 51 3 3 3 51 ...
 $ Event       : Factor w/ 489 levels "Alpine Skiing Men's Combined",..: 478 468 17 476 33 482 22 24 18 466 ...
 $ Medal       : Factor w/ 4 levels "Bronze","Gold",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Medal.No.Yes: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
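Several columns in this data frame look like they are derived from others. The report's output comes from R and does not show how the columns were built, so the following is only a minimal Python sketch under the assumption that `BMI` is weight in kilograms over squared height in metres, `GDPpC` is `GDP` divided by `Population`, and `Decade` is the year rounded down to the decade. The function names are hypothetical; the numbers are the three rows printed above.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def gdp_per_capita(gdp, population):
    """GDP divided by population (assumed definition of GDPpC)."""
    return gdp / population


def decade(year):
    """Label such as '1960s' for any year in that decade."""
    return f"{year // 10 * 10}s"


# (Weight, Height, printed BMI) for the first three rows shown above
rows = [(78, 171, 26.67487), (52, 162, 19.81405), (68, 178, 21.46194)]
for weight, height, printed in rows:
    print(f"computed {bmi(weight, height):.5f}  printed {printed}")

# GDP is taken as printed (537777800), which may itself be rounded
print(f"GDPpC {gdp_per_capita(537777800, 8996973):.4f}")
print(decade(1960))
```

The computed values agree with the printed `BMI` and `GDPpC` columns to the displayed precision, which supports these definitions without proving them. The coding of `BMI.Category` is not recoverable from the output and is left out.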

Number of sports per year at the Summer Olympics. Source: https://www.topendsports.com/events/summer/sports/number.htm [1]

There is clearly an upward trend but no seasonal pattern. The series is also a little choppy at the beginning, partly because the data points are not evenly spaced: most Olympic Games are 4 years apart, but a few are just 2 years apart, and World War I and World War II left 8-year and 12-year gaps, respectively. Since time series data should be evenly spaced over time, we’ll only look at data from 1948 on, when the Olympics started being held every 4 years without interruption.
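The spacing argument can be checked directly from the list of Games years. This is a minimal Python sketch rather than the R code behind the report. The year list is an assumption: it includes the 1906 Intercalated Games to reproduce the 2-year gaps, and it stops at 2016. The helper `first_regular_year` is a hypothetical name.

```python
# Summer Games years through 2016. Including 1906 is an assumption made here
# to reproduce the 2-year gaps mentioned in the text.
years = [1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932, 1936]
years += list(range(1948, 2017, 4))

# Gap in years between each pair of consecutive Games
gaps = [later - earlier for earlier, later in zip(years, years[1:])]


def first_regular_year(years, step=4):
    """Return the earliest year after which every gap equals `step`."""
    start = years[-1]
    # Walk backwards from the most recent Games until a gap differs from `step`
    for earlier, later in zip(reversed(years[:-1]), reversed(years[1:])):
        if later - earlier != step:
            break
        start = earlier
    return start


start = first_regular_year(years)
regular_years = [y for y in years if y >= start]

print("distinct gaps:", sorted(set(gaps)))
print("evenly spaced from:", start)
print("number of Games kept:", len(regular_years))
```

Under these assumptions, the 8-year gap is 1912 to 1920, the 12-year gap is 1936 to 1948, and the evenly spaced run starts in 1948.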

2 Creating models

I’m going to try 4 different models.

\[
\begin{aligned}
y_{\text{linear}}(x) &= ax + b \\
y_{\text{quadratic}}(x) &= ax^2 + bx + c \\
y_{\text{exponential}}(x) &= a\exp(bx) + c \\
y_{\text{cubic}}(x) &= ax^3 + bx^2 + cx + d
\end{aligned}
\]

I’ll then use ANOVA F-tests to compare the fits: linear vs. exponential, and the nested polynomial sequence linear vs. quadratic vs. cubic.
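To make the polynomial models concrete, here is a minimal Python sketch of a least-squares fit that returns the residual sum of squares and residual degrees of freedom, the two quantities an ANOVA comparison needs. It is not the code behind this report, whose output comes from R. The data are hypothetical, `solve` and `poly_fit` are illustrative names, and the exponential model is omitted because it needs nonlinear least squares.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def poly_fit(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns (coefficients from constant term upward, RSS, residual df).
    """
    powers = range(degree + 1)
    # Normal equations (X'X) beta = X'y, with X[i][j] = xs[i] ** j
    xtx = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    beta = solve(xtx, xty)
    fitted = [sum(b * x ** j for j, b in enumerate(beta)) for x in xs]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return beta, rss, len(xs) - (degree + 1)


# Hypothetical data: an exact quadratic sampled at 10 evenly spaced points.
# Small index values (0, 1, 2, ...) keep the normal equations well conditioned.
xs = list(range(10))
ys = [1 + 2 * x + 0.5 * x * x for x in xs]

for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    beta, rss, res_df = poly_fit(xs, ys, degree)
    print(f"{name:<10} Res.Df={res_df}  RSS={rss:.6f}")
```

Residual degrees of freedom are the number of observations minus the number of fitted coefficients. With 19 observations that gives 17, 16 and 15 for the linear, quadratic and cubic models.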

Now I will fit these models to the number of events per Olympic Games.

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 17 33.33158 NA NA NA NA
Exponential 16 32.28073 1 1.050847 0.5208539 0.4808924

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 17 33.33158 NA NA NA NA
Quadratic 16 30.41244 1 2.919136 1.5357588 0.2331213
Cubic 15 29.04971 1 1.362737 0.7036577 0.4147260
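The F values in these tables can be reproduced from the residual sums of squares and degrees of freedom alone. This minimal Python sketch is not the R code that produced the tables, and the function name `f_statistic` is hypothetical. Each row is compared with the row directly above it, and the larger model's residual mean square is the denominator. The inputs are the rounded values printed above, so agreement is only to about five decimal places.

```python
def f_statistic(rss_small, df_small, rss_large, df_large):
    """F statistic for comparing a smaller model with a larger one.

    rss_*: residual sum of squares; df_*: residual degrees of freedom.
    """
    extra_df = df_small - df_large   # the "Df" column
    sum_sq = rss_small - rss_large   # the "Sum Sq" column
    return (sum_sq / extra_df) / (rss_large / df_large)


# (Res.Sum Sq, Res.Df) as printed for the events-per-Games fits
linear = (33.33158, 17)
exponential = (32.28073, 16)
quadratic = (30.41244, 16)
cubic = (29.04971, 15)

print(f"linear vs exponential: {f_statistic(*linear, *exponential):.4f}")
print(f"linear vs quadratic:   {f_statistic(*linear, *quadratic):.4f}")
print(f"quadratic vs cubic:    {f_statistic(*quadratic, *cubic):.4f}")
```

The same arithmetic explains the rows in later sections where Sum Sq and F are negative and Pr(>F) is reported as 1: there the exponential fit has a larger residual sum of squares than the linear fit, so the difference is negative.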

3 Sports

3.1 Top 10 Sports

  Year Mean_Weight StdDev_Weight Mean_Height StdDev_Height    Sport Sex
1 1924    64.00000      0.000000    167.0000      0.000000 Swimming   F
2 1956    61.00000      4.780914    169.7333      3.634491 Swimming   F
3 1960    62.73469      5.619073    169.3469      6.839076 Swimming   F
4 1964    63.06000      6.466270    171.3600      4.378799 Swimming   F
5 1968    62.45455      5.361348    170.3636      4.583033 Swimming   F
6 1972    60.23611      5.491333    170.3889      4.949194 Swimming   F
'data.frame':   339 obs. of  7 variables:
 $ Year         : int  1924 1956 1960 1964 1968 1972 1976 1980 1984 1988 ...
 $ Mean_Weight  : num  64 61 62.7 63.1 62.5 ...
 $ StdDev_Weight: num  0 4.78 5.62 6.47 5.36 ...
 $ Mean_Height  : num  167 170 169 171 170 ...
 $ StdDev_Height: num  0 3.63 6.84 4.38 4.58 ...
 $ Sport        : Factor w/ 10 levels "Basketball","Canoeing",..: 9 9 9 9 9 9 9 9 9 9 ...
 $ Sex          : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
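A summary table of this shape can be built by grouping athlete records on Year, Sport and Sex and taking the mean and standard deviation of weight and height within each group. The report's output comes from R and does not show its aggregation code, so this is a minimal Python sketch. The records are hypothetical, and reporting a standard deviation of 0 for single-athlete groups is a choice made only for this sketch.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical athlete records: (Year, Sport, Sex, Weight in kg, Height in cm)
records = [
    (1960, "Swimming", "F", 60.0, 168.0),
    (1960, "Swimming", "F", 64.0, 172.0),
    (1960, "Swimming", "M", 75.0, 180.0),
    (1960, "Swimming", "M", 79.0, 184.0),
    (1964, "Swimming", "F", 62.0, 170.0),
]

# Collect the measurements belonging to each (Year, Sport, Sex) group
groups = defaultdict(list)
for year, sport, sex, weight, height in records:
    groups[(year, sport, sex)].append((weight, height))


def spread(values):
    """Sample standard deviation; 0.0 for a single value (sketch-only choice)."""
    return stdev(values) if len(values) > 1 else 0.0


summary = {}
for key, values in groups.items():
    weights = [w for w, _ in values]
    heights = [h for _, h in values]
    summary[key] = {
        "Mean_Weight": mean(weights),
        "StdDev_Weight": spread(weights),
        "Mean_Height": mean(heights),
        "StdDev_Height": spread(heights),
    }

for (year, sport, sex), stats in sorted(summary.items()):
    print(year, sport, sex, stats)
```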

3.2 Medalists

Medal      mean
Bronze     25.55859
Gold       25.28269
No Medal   24.93049
Silver     25.48383

# A tibble: 6 x 3
# Groups:   Year [3]
   Year Sex   mean.Age
  <int> <fct>    <dbl>
1  1960 F         21.6
2  1960 M         26.0
3  1964 F         21.5
4  1964 M         25.7
5  1968 F         20.5
6  1968 M         25.1

3.3 Swimming

3.3.1 Female Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 8.508555 NA NA NA NA
Exponential 12 8.161454 1 0.3471001 0.5103504 0.4886515

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 8.5085546 NA NA NA NA
Quadratic 12 3.3398817 1 5.168673 18.57074 0.0010150
Cubic 11 0.6470143 1 2.692867 45.78190 0.0000309

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 11.69717 NA NA NA NA
Exponential 12 11.34729 1 0.3498782 0.3700035 0.5543414

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 11.697173 NA NA NA NA
Quadratic 12 5.842805 1 5.854368 12.02375 0.0046521
Cubic 11 2.432617 1 3.410188 15.42046 0.0023647

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 4.326164 NA NA NA NA
Exponential 12 4.408289 1 -0.0821246 -0.223555 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 4.326164 NA NA NA NA
Quadratic 12 4.163567 1 0.162597 0.4686279 0.5066258
Cubic 11 2.732882 1 1.430686 5.7585885 0.0352509

3.3.2 Male Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 3.720714 NA NA NA NA
Exponential 12 3.559721 1 0.1609931 0.5427161 0.475466

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 3.7207136 NA NA NA NA
Quadratic 12 1.8313700 1 1.889344 12.37987 0.0042351
Cubic 11 0.4074947 1 1.423875 38.43639 0.0000672

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 13.00892 NA NA NA NA
Exponential 12 12.95885 1 0.0500719 0.046367 0.8331264

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 13.008921 NA NA NA NA
Quadratic 12 12.927824 1 0.0810975 0.0752771 0.7884689
Cubic 11 2.535541 1 10.3922832 45.0851030 0.0000331

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 10.26588 NA NA NA NA
Exponential 12 10.76283 1 -0.4969448 -0.5540679 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 10.265883 NA NA NA NA
Quadratic 12 4.739344 1 5.526539 13.993175 0.0028181
Cubic 11 3.544380 1 1.194964 3.708575 0.0803623

3.4 Athletics

3.4.1 Female Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 4.210569 NA NA NA NA
Exponential 12 4.285409 1 -0.0748399 -0.2095666 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 4.210569 NA NA NA NA
Quadratic 12 3.883503 1 0.3270661 1.010632 0.3345930
Cubic 11 1.057545 1 2.8259583 29.394066 0.0002097

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 24.06089 NA NA NA NA
Exponential 12 24.02365 1 0.0372406 0.018602 0.8937751

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 24.06089 NA NA NA NA
Quadratic 12 17.52401 1 6.536879 4.476290 0.0559590
Cubic 11 13.83787 1 3.686136 2.930182 0.1149498

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 8.095579 NA NA NA NA
Exponential 12 8.097702 1 -0.0021235 -0.0031468 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 8.095579 NA NA NA NA
Quadratic 12 8.085980 1 0.0095987 0.0142450 0.9069711
Cubic 11 7.974949 1 0.1110312 0.1531475 0.7030196

3.4.2 Male Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 0.9457009 NA NA NA NA
Exponential 12 0.9332067 1 0.0124942 0.1606617 0.695593

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 0.9457009 NA NA NA NA
Quadratic 12 0.7883551 1 0.1573458 2.395050 0.1476766
Cubic 11 0.6019224 1 0.1864327 3.407016 0.0919830

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 20.30145 NA NA NA NA
Exponential 12 20.31487 1 -0.0134131 -0.0079231 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 20.30145 NA NA NA NA
Quadratic 12 20.29326 1 0.0081908 0.0048435 0.9456622
Cubic 11 13.12999 1 7.1632736 6.0012249 0.0322607

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 3.276112 NA NA NA NA
Exponential 12 3.329318 1 -0.0532056 -0.191771 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 3.276112 NA NA NA NA
Quadratic 12 2.659483 1 0.6166295 2.782328 0.1211719
Cubic 11 2.106053 1 0.5534296 2.890585 0.1171633

3.5 Gymnastics

3.5.1 Female Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 35.59138 NA NA NA NA
Exponential 12 35.36262 1 0.2287617 0.0776283 0.7852807

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 35.591383 NA NA NA NA
Quadratic 12 4.159030 1 31.4323530 90.691399 0.0000006
Cubic 11 3.733688 1 0.4253421 1.253121 0.2868053

3.5.2 Male Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 14.16465 NA NA NA NA
Exponential 12 14.07826 1 0.0863857 0.0736333 0.79073

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 14.164646 NA NA NA NA
Quadratic 12 2.872336 1 11.292310 47.176841 0.0000173
Cubic 11 1.627689 1 1.244646 8.411377 0.0144390

3.6 Rowing

3.6.1 Female Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 4.149078 NA NA NA NA
Exponential 8 4.234549 1 -0.0854708 -0.1614733 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 4.149078 NA NA NA NA
Quadratic 8 2.336048 1 1.8130300 6.208878 0.0374193
Cubic 7 1.559452 1 0.7765967 3.485954 0.1041248

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 15.83480 NA NA NA NA
Exponential 8 16.00342 1 -0.1686178 -0.0842909 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 15.834804 NA NA NA NA
Quadratic 8 2.320446 1 13.5143582 46.592285 0.0001342
Cubic 7 1.926398 1 0.3940476 1.431861 0.2704117

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 9.424177 NA NA NA NA
Exponential 8 9.656112 1 -0.2319344 -0.1921556 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 9.424177 NA NA NA NA
Quadratic 8 1.104136 1 8.3200410 60.2827063 0.0000541
Cubic 7 1.008614 1 0.0955226 0.6629479 0.4423346

3.6.2 Male Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 1.923962 NA NA NA NA
Exponential 12 1.872159 1 0.0518021 0.3320367 0.5751109

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 1.923962 NA NA NA NA
Quadratic 12 1.510689 1 0.4132727 3.2827891 0.0950925
Cubic 11 1.462089 1 0.0485994 0.3656368 0.5576592

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 19.88407 NA NA NA NA
Exponential 12 20.63138 1 -0.7473091 -0.4346635 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 19.884073 NA NA NA NA
Quadratic 12 5.894874 1 13.9891993 28.477351 0.0001775
Cubic 11 5.256667 1 0.6382065 1.335498 0.2723140

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 7.273221 NA NA NA NA
Exponential 12 7.634802 1 -0.3615807 -0.5683144 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 7.273221 NA NA NA NA
Quadratic 12 2.012381 1 5.260840 31.3708479 0.0001160
Cubic 11 1.845598 1 0.166783 0.9940485 0.3401819

3.7 Basketball

3.7.1 Female Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 4.141973 NA NA NA NA
Exponential 8 4.189668 1 -0.0476956 -0.0910728 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 4.141973 NA NA NA NA
Quadratic 8 3.810224 1 0.3317491 0.6965451 0.4281641
Cubic 7 3.797734 1 0.0124897 0.0230211 0.8836825

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 17.87967 NA NA NA NA
Exponential 8 18.09276 1 -0.213091 -0.0942215 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 17.87967 NA NA NA NA
Quadratic 8 14.74145 1 3.138222 1.703074 0.2281701
Cubic 7 12.24426 1 2.497190 1.427635 0.2710586

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 18.31425 NA NA NA NA
Exponential 8 18.54528 1 -0.2310301 -0.099661 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 9 18.314246 NA NA NA NA
Quadratic 8 10.723460 1 7.590785 5.662937 0.0445670
Cubic 7 8.311638 1 2.411823 2.031219 0.1971233

3.7.2 Male Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 4.943144 NA NA NA NA
Exponential 12 4.922646 1 0.0204985 0.0499694 0.8268768

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 4.943144 NA NA NA NA
Quadratic 12 4.892514 1 0.0506298 0.1241811 0.7306549
Cubic 11 4.335635 1 0.5568795 1.4128668 0.2596093

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 78.62687 NA NA NA NA
Exponential 12 80.66841 1 -2.041543 -0.3036941 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 78.62687 NA NA NA NA
Quadratic 12 66.59475 1 12.03212 2.16812 0.1666358
Cubic 11 30.06034 1 36.53441 13.36906 0.0037784

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 25.76692 NA NA NA NA
Exponential 12 26.84673 1 -1.079811 -0.482656 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 25.766919 NA NA NA NA
Quadratic 12 8.389214 1 17.3777050 24.8572104 0.0003168
Cubic 11 7.934290 1 0.4549246 0.6307017 0.4438938

3.8 Fencing

3.8.1 Female Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 12.21269 NA NA NA NA
Exponential 12 12.21715 1 -0.0044544 -0.0043752 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 12.212691 NA NA NA NA
Quadratic 12 5.825501 1 6.3871898 13.1570266 0.0034681
Cubic 11 5.491448 1 0.3340534 0.6691473 0.4307124

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 17.00640 NA NA NA NA
Exponential 12 16.74273 1 0.2636671 0.1889778 0.6714834

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 17.00640 NA NA NA NA
Quadratic 12 12.67503 1 4.331365 4.100690 0.0657024
Cubic 11 11.45006 1 1.224974 1.176824 0.3012066

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 16.51064 NA NA NA NA
Exponential 12 16.66480 1 -0.1541545 -0.1110037 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 16.51064 NA NA NA NA
Quadratic 12 15.77139 1 0.7392517 0.5624755 0.4677146
Cubic 11 10.19087 1 5.5805213 6.0236018 0.0320006

3.8.2 Male Athletes

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 15.32277 NA NA NA NA
Exponential 12 15.33475 1 -0.011983 -0.0093771 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 15.322770 NA NA NA NA
Quadratic 12 8.207974 1 7.114796 10.401781 0.0072842
Cubic 11 7.147501 1 1.060473 1.632067 0.2277214

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 14.71278 NA NA NA NA
Exponential 12 14.86734 1 -0.1545598 -0.1247511 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 14.71278 NA NA NA NA
Quadratic 12 14.08288 1 0.6299021 0.5367385 0.4778544
Cubic 11 11.92883 1 2.1540485 1.9863245 0.1863646

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 5.329521 NA NA NA NA
Exponential 12 5.537634 1 -0.2081129 -0.4509788 1

Models Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
Linear 13 5.329521 NA NA NA NA
Quadratic 12 3.072275 1 2.2572456 8.8165764 0.0117170
Cubic 11 2.869575 1 0.2027005 0.7770159 0.3969078

References

[1] R. Wood (2010).

Izzy Illari

20 April, 2020